To: J. Clancy, C. Mundie, S. Schleimer, W. Slack,
R. Belgard, J. Dooda, R. Gruner, S. Redfield, S. Wallach

From: J. Ahlstrom, M. Druke, W. Wallach

MEMO 196 MARCH 12, 1977 

Subject: SUMMARY MIX AND FUNCTION BY MODULES FOR

COBOL, FORTRAN, SPL, OS kernel

Note: The following mix weights are totally SWAG.

MIX SUMMARY 



COMMERCIAL INSTALLATIONS 

The standard mix for commercial installations is guessed to be: 
70% Cobol object programs 

30% Spl compilers, data base, debuggers, OS 

NUMERICAL INSTALLATIONS 

The standard mix for numerical installations is guessed to be: 
60% Fortran object programs 

40% Spl compilers, OS, debuggers, editors 

MIXED INSTALLATIONS 

Much more variability in mix can be expected from mixed 
installations than from exclusively commercial or numerical ones. This 
mix is presented for a mythical "perfectly balanced" mixed installation: 

33% Cobol 
33% Fortran 
34% Spl



CONTRIBUTIONS TO PERFORMANCE BY OPERATION BY LANGUAGE

                              COMMERCIAL    NUMERICAL     MIXED

COBOL
  addpacked                       4%                        2%
  cmprpacked                      7%                        3%
  cmprdisplay                    30%                       15%
  movechars                      20%                       10%
  adddisplay                      4%                        2%
  mpypacked                       2%                        1%

FORTRAN
  add floating                                 11%          6%
  index add                                    11%          6%
  cmpr&branch incl do update                   10%          5%
  move                                     [illegible]      3%
  mpy floating                                  5%          3%
  indirection                                   4%          2%
  go to (unconditional)                         3%          2%
  format edit                                   2%          1%
  radix convert                                 2%          1%

SPL
  move                           15%           18%         14%
  goto                            8%           10%     [illegible]
  call                            5%       [illegible]      4%
  compare&branch                  9%           12%          8%
  bittest&branch                  5%            6%     [illegible]
  arithmetic                      3%            5%          3%



Module oriented functions

We can abstract from these language-oriented operations to
module-oriented functions, producing the following breakdown of what JP
modules must be able to do well to produce competitive machines:

PARSE 

Deliver canonical operand specifiers to FETCH at the rate of
one per cycle. Note that this is not possible for SPL, COBOL
and PL/I when operand lengths are specified by structured
literals. This argues for longer fixed length literals for
Cobol and PL/I.

Completely process unconditional jumps invisibly to other units.

Prefetch both targets of a conditional branch, waiting for the
condition to be resolved only to decide which to process.

The parse's relation to exception handling:

external interrupts,
s_op dependent faults,
machine checks

is yet to be specified (TBS).



FETCH 

Accept canonical operand specification, generate and pass to
cache A00 and fetch length, modify remaining length and address,
and specify its own next nano instruction address in 1 cycle.
Where fetch length is:

the minimum of JPD-bus width and (remaining) operand length.

When the amount of data to be fetched is less than one JPD-width,
specify justification, extension and fill characteristics.

For multiple JPD-width operand fetches, if the length is not yet
exhausted and the condition, if any, is not yet determined by the
execute box, send A00 and length to cache, modify length and
address and specify own next nano instruction in one cycle.
Specify justification, extension, and fill for short lengths.

Handle compiler detected or user specified array operations,
to fetch and store elements of vectors that are being operated
on as aggregates rather than single elements.

Abort multi-JPDB-width fetches when execute has already
determined result of comparison.

Handle overlapping strings when compilers cannot or do not
handle them.

Exception handling TBS.
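
The fetch-length rule above can be sketched as a simple loop. This is an illustration only: the name `fetch_lengths`, the unit of length, and the 4-unit bus width in the usage line are assumptions made for the example, not FHP design values.

```python
def fetch_lengths(operand_length, bus_width):
    """Yield the length of each successive fetch for one operand.

    Each fetch takes the minimum of the bus width and the remaining
    operand length, then the remaining length is reduced by that amount.
    """
    remaining = operand_length
    while remaining > 0:
        length = min(bus_width, remaining)
        yield length
        remaining -= length


# A 10-unit operand on a hypothetical 4-unit-wide bus: two full-width
# fetches followed by one short fetch that needs justification and fill.
print(list(fetch_lengths(10, 4)))
```

The final short fetch is the case for which the memo asks FETCH to specify justification, extension and fill.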

INTERPRETER 

Extract and insert arbitrary fields in arbitrary length
operands.

Access known structures through physical addresses.

Generate memory addresses to chase linked data structures.

Generate own next nano address based on extracted fields and
several staticised bits, perhaps 16 to 64 way CASES.

Exception handling TBS. 

EXECUTE 

Packed decimal arithmetic and comparisons including digit
validity.

Display comparisons including weird sign conventions.

Packing, unpacking and editing including digit validity checks.
Overflow on 32 bit stores, and 64 bit calculations.

Binary comparisons signed and unsigned. 

Conversion from binary to decimal radices. 

Floating point arithmetic. 

Fixed point arithmetic. 

Exception handling TBS. 




APPENDIX I: SPL OPERATIONS AND OPERANDS



OBJECTIVE 

To characterize system programming fundamental operations the
efficient execution of which is essential to SPL program performance.



OBSERVATIONS 

Burroughs uses a very PL/N like language called SDL as the
implementation language for the B1700/1800 systems. Though SPL is
different from both these languages, the problems it will be called on to
solve are similar. It can be expected that SPL programs will use many
more strange-length variables than SDL, and more subscripting.
Additionally, the S-languages are philosophically quite different
between SPL and SDL. This study of DYNAMIC execution characteristics of
the B1700 MCP can only characterise the kinds of source language
operations that systems programming languages specify, not the details
of the actual s_ops that SPL will execute. Two dynamic traces of
approximately 4,000,000 s-ops each (obtained by tracing the 1700 MCP)
agree quite well in their frequencies. One study is multiprogramming a
number of jobs in ample memory with memory management activity
essentially limited to changes in their working sets. The second is
thrashing in less memory with the MCP spending a substantial proportion
(11%) of its time executing the s-op that searches through memory
looking for available space.



                       % of operations executed when
Operation              not thrashing    thrashing

move                       10.6           11.3
arithmetic                  4.23           4.83
  not                       2.00           1.80
  add                       1.15           1.44
  sub                        .84           1.23
comparison                  4.69           5.04
  eql                       2.58           3.00
  neq                       1.32           1.14
  gtr                        .48            .53
boolean                     1.20           1.38
program cntrl              18.22          17.22
  conditional               7.07           6.80
    ifthen                  4.25           4.47
    ifelse                  1.61           1.27
    leavec                  1.21           1.06    conditionally leave block
  unconditional            11.15          10.42
    call                    4.93           3.72
    leave                   2.46           3.00    unconditionally leave
    return                  1.50           1.47    function value return
    cycle                   1.12            .82    next iteration
    exit                     .71            .92    procedure return
    case                     .43            .49
construct                  <8.00          <8.00    overhead caused by processing
                                                   parameter and local variable packets
                                                   (S-language architectural environment)
load address              <50.00         <50.00    but not much <
or value on stack                                  (S-language overhead...)
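
The conditional and unconditional program control entries read as subtotals of the operations listed beneath them. The following Python check is a sketch under that assumption, with the figures taken as printed in the table; the grouping into the two dictionaries is an interpretation of the table layout, not something the memo states.

```python
# Percentages as printed: (not thrashing, thrashing).
CONDITIONAL = {
    "ifthen": (4.25, 4.47),
    "ifelse": (1.61, 1.27),
    "leavec": (1.21, 1.06),
}
UNCONDITIONAL = {
    "call": (4.93, 3.72),
    "leave": (2.46, 3.00),
    "return": (1.50, 1.47),
    "cycle": (1.12, 0.82),
    "exit": (0.71, 0.92),
    "case": (0.43, 0.49),
}


def subtotal(group, column):
    """Sum one column (0 = not thrashing, 1 = thrashing) of a group."""
    return round(sum(row[column] for row in group.values()), 2)


for column, label in enumerate(("not thrashing", "thrashing")):
    cond = subtotal(CONDITIONAL, column)
    uncond = subtotal(UNCONDITIONAL, column)
    # The last value printed is the implied "program cntrl" total.
    print(label, cond, uncond, round(cond + uncond, 2))
```

Under this grouping the subtotals reproduce the printed conditional, unconditional and program control figures for both traces.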




Like SPL, SDL allows the specification of variable length integers
and bit strings as well as character strings. For data that are NOT
included in structures (records) this facility is little used in SDL,
partly because it is fewer keystrokes to specify a full 24 bit integer
and partly because there is no space saving in specifying stack-frame
variables of less than 24 bits rather than one of 24 bits. Whether it
is used in SPL I suspect will be more a matter of management than of
technology: if it is as easy to specify the exact interval of a variable
as some standard or default interval then that will be done.

In 1,556,823 references to variables not in structures (implying
approximately 2,500,000 references to variables in structures) in the
1700 MCP's dynamic trace, the distribution of lengths with non-zero
frequencies is:



bit              reference    reference
length           frequency        %

1                   76,385        5
2                      798
3                    3,766
4                      353
5                      268
6                      627
7                   22,629        1     size of i/o channel field
8                    2,079              - not including 1 char strings
12                   1,588
16                     597
20                     285
24 unsigned      1,323,423       85     these are the two lengths that
24 signed          124,025              can be specified without thought
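
As a consistency check on the table, the thirteen reference counts can be summed and converted to percentages. This is a sketch only: the label of the seventh row is not legible in the scanned table and is assumed here to be 7 bits, and the dictionary name is hypothetical.

```python
# Reference counts by bit length, as printed.
# The key "7" is an assumption: that row's label is missing in the scan.
REFERENCES = {
    "1": 76385,
    "2": 798,
    "3": 3766,
    "4": 353,
    "5": 268,
    "6": 627,
    "7": 22629,
    "8": 2079,
    "12": 1588,
    "16": 597,
    "20": 285,
    "24 unsigned": 1323423,
    "24 signed": 124025,
}

total = sum(REFERENCES.values())
percent = {length: 100 * count / total for length, count in REFERENCES.items()}

print(total)
for length, share in percent.items():
    print(f"{length:>12}  {share:6.2f}%")
```

The counts add up to the 1,556,823 references quoted in the text, and the 1-bit and 24-bit unsigned rows come out at roughly the 5% and 85% shown in the table.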



SPL will much more strongly encourage the declaration and use of strange
width non-structured variables than SDL does, thus making the numbers
in this table only representative of languages that allow this facility,
not at all typical of the lengths we will actually encounter in SPL.
References to variables in structures will 'always' be to 'strange'
lengths.

To the extent that SPL programs are similar to the B1700 MCP,
they will exhibit the following characteristics:

28% stores
30% unconditional transfers of control
    12.5% call
30% conditional branching (40% conditions true)
    20% requiring comparison
    10% bit testing only
8% arithmetic

To the extent that SPL has explicit semantically rich operations for
functions that must be composed out of SDL s-ops, these % will be
reduced, particularly the program control ones.




APPENDIX II: CORE.FORTRAN



ABSTRACT: 

A study of Fortran performance is undertaken by analyzing
recent publications, resulting in a dynamic mix of Fortran primitives.

This memo is an attempt to identify the "core"
operations which must be executed efficiently by our machine to ensure
competitive Fortran performance. The numbers in this report were
deciphered from various inputs including:

1) a large static and a small dynamic analysis of
Fortran programs done by Knuth at Stanford,

2) two static studies which Robinson and Torsun reported
in the British Computer Journal, and

3) a static and dynamic analysis of Algol performed by
Wichmann.



The algorithm used to combine these inputs and derive
the dynamic mix was roughly:

1. Determine the static distribution of the 8 most
frequently occurring executable statements.

2. Using the dynamic study as a basis, infer a
dynamic distribution of statement occurrences.

3. Determine the types and distribution of primitive
operations that each statement could compile into.

4. Combining the results of 2 and 3, produce a dynamic
mix.

Each step in this procedure adds to the error already present
in the inputs, resulting in an uncomfortably low confidence factor
in the final conclusions. However, I believe this algorithm is the
best technique available to produce these results. When new data
and more informed intuition are obtained, further iterations
of this algorithm should converge on a "correct" mix. In order
to identify the areas where errors could be introduced, each
assumption that was made is recorded. Any refinements or second opinions
would be very useful in producing a better iteration of this mix.



STATIC DISTRIBUTIONS

The static frequencies of occurrence of the 8 most common executable
statements in the sample programs are enumerated in the
following table. These numbers are normalized to reflect true
percentages of executable statements, i.e. those statements which only
affect compilation are removed (e.g. CONTINUE, DIMENSION, END, etc.).
Each study provides data on two sample sets:

Knuth presents results of a huge sample of programs written at Lockheed
Corp., as well as a much smaller set written by students at Stanford.

The British Computer Journal article (BCJ) reports on a "system" and a
"student" sample.

It is interesting to note that the two "commercial" (i.e. Knuth's
Lockheed and BCJ's system) samples agree much better than the student
samples. This is somewhat reassuring since these samples will surely be
more similar to the typical Fortran programs
written on our machine than the "toys" (as John Pilat calls them)
written by the students.

Consequently, when computing the average percentage of each
statement, the commercial samples were weighted 3:1 over the student
samples. Logical IF's, i.e. IF (cond_expr) statement, are treated
as two statements, one IF and one "statement".



                    KNUTH                  BCJ           WEIGHTED
              LOCKHEED   STUDENT     SYSTEM   STUDENT    AVERAGE

Assignment      46.0       60.1       48.1      50.3       49.1
IF              16.3       10.0       16.7      11.2       15.0
GOTO            14.6        9.4       13.1      12.6       13.1
DO               4.5        5.9        5.2       7.8        5.4
CALL             9.0        4.7        4.3       4.0        6.1
RETURN           2.2        2.4        2.0       2.8        2.2
WRITE            4.5        5.9        8.5       7.9        6.6
READ              .3        1.2        1.3       2.3        1.0

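
The 3:1 weighting described above can be checked against the WEIGHTED AVERAGE column. The sketch below assumes the weighting is a plain weighted mean with weight 3 on each commercial sample (Knuth's Lockheed, BCJ's system) and weight 1 on each student sample; the function and variable names are illustrative.

```python
# (Lockheed, Knuth student, BCJ system, BCJ student, memo's weighted average)
STATIC = {
    "Assignment": (46.0, 60.1, 48.1, 50.3, 49.1),
    "IF": (16.3, 10.0, 16.7, 11.2, 15.0),
    "GOTO": (14.6, 9.4, 13.1, 12.6, 13.1),
    "DO": (4.5, 5.9, 5.2, 7.8, 5.4),
    "CALL": (9.0, 4.7, 4.3, 4.0, 6.1),
    "RETURN": (2.2, 2.4, 2.0, 2.8, 2.2),
    "WRITE": (4.5, 5.9, 8.5, 7.9, 6.6),
    "READ": (0.3, 1.2, 1.3, 2.3, 1.0),
}


def weighted_average(lockheed, knuth_student, bcj_system, bcj_student, weight=3):
    """Weighted mean with the two commercial samples weighted `weight`:1."""
    commercial = weight * (lockheed + bcj_system)
    student = knuth_student + bcj_student
    return (commercial + student) / (2 * weight + 2)


for statement, (*samples, memo_value) in STATIC.items():
    computed = weighted_average(*samples)
    print(f"{statement:<11} computed {computed:6.3f}   memo {memo_value}")
```

Under this assumption every computed value agrees with the printed column to within rounding.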


DYNAMIC DISTRIBUTIONS

The only explicit dynamic information available results from tests
performed by Knuth on his student "toys". Other tidbits of information
can be inferred from various data, but more reliable
numbers cannot be assembled without more inputs. Knuth's dynamic
data is summarized in the following table. Also depicted is an attempt
to determine a more accurate dynamic mix by assuming that the
dynamic/static ratio is invariant, therefore allowing a normalized
dynamic average to be computed from the static averages. It is
important to note that these dynamic distributions are not weighted
by estimated execution times. Such a transformation would defeat the
purpose of this exercise, which is to determine which operations provide
more "leverage", i.e. to determine which operations, when accelerated,
contribute most to an overall increase in Fortran performance.

              Knuth     Knuth      dyn/      ave.     normalized
Statement     Static    Dynamic    static    Static   computed Dyn

Assignment     60.1      64.4       1.1       49.1       56.6
IF             10.0      10.5                 15.0       17.3
GOTO            9.4       8.6        .9       13.1       12.4
DO              5.9       9.6       1.6        5.4        9.0
* CALL          4.7       2.9        .7        6.1        3.2
* RETURN        2.4       2.9       1.3        2.2         -
WRITE           5.9       1.0        .2        6.6        1.3
READ            1.2       0.0       0.0        1.0        0.0

* Of course, the dynamic frequencies of CALL & RETURN must be
equal. Therefore, in computing the normalized computed dynamic
frequency they were combined as a single dynamic statement
whose frequency is assumed to be the average of the two results
of multiplying the static frequencies by their dynamic/static
ratios. The other frequencies were adjusted to reflect this
merger.

Statement breakdowns

In this section each of the eight statements is analyzed
in detail to determine a plausible mix of primitive operations that
each statement could compile into. Architectural overhead loads and
stores are assumed to be nonexistent since the S-ops executing these
common statements will surely be semantically rich.



ASSIGNMENT 

All the studies provide information about the relative occurrence
of operators within assignment statements, from which the following
distribution of operators is derived (note: add includes sub):

add               60%
mpy               26%
div                8%
library functs     4%
user functs        2%




The problem then reduces to determining the average number
of operators per assignment statement. The answer was obtained
by making two approximations:

1) 45% of all dynamic occurrences of assignments are moves and

2) in the remaining 55%, there is an average of 2 operators per
expression. This results in the conclusion that the
average assignment statement is executed as:

move                 .45
add                  .66
mpy                  .29
div                  .09
library functions    .04
user functions       .02
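
The arithmetic behind these figures can be reproduced as follows. This is a sketch of the calculation as described (55% of assignments carry 2 operators each, split by the operator distribution); the constant and function names are illustrative.

```python
# Share of each operator among operators in assignment statements.
OPERATOR_SHARE = {
    "add": 0.60,
    "mpy": 0.26,
    "div": 0.08,
    "library functions": 0.04,
    "user functions": 0.02,
}
MOVE_FRACTION = 0.45            # assignments that are plain moves
OPERATORS_PER_EXPRESSION = 2    # average in the remaining assignments


def assignment_mix():
    """Average primitive operations executed per assignment statement."""
    operators = (1 - MOVE_FRACTION) * OPERATORS_PER_EXPRESSION
    mix = {"move": MOVE_FRACTION}
    for operator, share in OPERATOR_SHARE.items():
        mix[operator] = share * operators
    return mix


for operation, count in assignment_mix().items():
    print(f"{operation:<18} {count:.2f}")
```

Rounded to two places, the results match the list above.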



The next primitive operations resulting from
assignment statements are index manipulations. The first bit of
information necessary is the following distribution of subscripts among
variables:

0      63%
1      25%
2      10%
>2      2%

Assuming reasonable compiler optimization we can, perhaps a
little optimistically, assume that all singly-subscripted variables
require no index arithmetic, all doubly-subscripted variables require an
index add, and all variables with more than 2 subscripts require an
index multiply and an index add. This, together with an assumed average
of 2.5 variables per assignment, results in the conclusion that the
average assignment statement will require .30 index adds and .05 index
mpy's.
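
The index figures follow directly from the subscript distribution. A minimal sketch of that arithmetic, with illustrative names:

```python
# Fraction of variables by number of subscripts.
SUBSCRIPT_SHARE = {"0": 0.63, "1": 0.25, "2": 0.10, ">2": 0.02}
VARIABLES_PER_ASSIGNMENT = 2.5


def index_operations():
    """Index adds and multiplies per average assignment statement."""
    # Two subscripts cost one index add; more than two cost an add and a multiply.
    needs_add = SUBSCRIPT_SHARE["2"] + SUBSCRIPT_SHARE[">2"]
    needs_mpy = SUBSCRIPT_SHARE[">2"]
    return (VARIABLES_PER_ASSIGNMENT * needs_add,
            VARIABLES_PER_ASSIGNMENT * needs_mpy)


index_adds, index_mpys = index_operations()
print(round(index_adds, 2), round(index_mpys, 2))
```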




Variables that are arguments to or results from a called
function are referenced indirectly; this overhead should also be
computed. However, since these indirect references occur whenever a
CALL occurs, this analysis is postponed until the section on CALL.



IF 

The two classes of IF statements, arithmetic and logical, must
be analyzed separately. Logical IF's, which comprise approximately
70% of all IF's, are straightforward: each compiles into a simple
compare&branch operation. Arithmetic IF's, however, contain an
arithmetic expression as well as three possible branch addresses.

The three address question was resolved by assuming that 10% of all
arithmetic IF's (3% of all IF's) specify three different addresses and
therefore require an additional compare&branch. The expressions within
an arithmetic IF were assumed to be comparable to those in assignments.
All this results in the following conclusions about the average IF:

1.03 compare&branch
 .20 add
 .08 index add
 .08 mpy
 .03 div
 .02 index mpy
 .01 library function.

DO 

The DO statement is executed twice, once for loop setup and
again for loop iteration. The average loop was assumed to be executed
10 times, requiring the loop setup operation frequencies to be
attenuated by a factor of ten. DO loop iteration requires an index add
and an index comp&branch. Although there is a difference between an
index comp&branch and the IF comp&branch (the loop count is incremented
as a side effect), they are similar enough to be treated as the same
primitive operation in the mix. The complexity of the DO loop setup
depends on whether the loop increment is the default of one (95%) or
some specified value (5%). If the increment is one, the loop count can
be determined by a simple subtraction: the entire loop setup is a move
and an index add (sub). If, on the other hand, the increment is not one,
an additional index add and index divide are necessary to compute the
count. This results in the average DO statement being executed as:

1.1   index add
1.0   compare&branch
 .1   move
 .005 index divide
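
The DO figures can be derived by charging each iteration its own operations plus one tenth of the setup cost. The sketch below is one reading of the text above; the function name and parameter names are assumptions.

```python
def do_statement_mix(iterations=10, non_unit_increment=0.05):
    """Primitive operations per dynamic DO execution.

    Setup cost is spread over the assumed number of iterations; loops with
    a non-unit increment need one extra index add and one index divide.
    """
    setup = {
        "move": 1.0,
        "index add": 1.0 + non_unit_increment,
        "index divide": non_unit_increment,
    }
    mix = {"index add": 1.0, "compare&branch": 1.0}  # cost of one iteration
    for operation, count in setup.items():
        mix[operation] = mix.get(operation, 0.0) + count / iterations
    return mix


for operation, count in do_statement_mix().items():
    print(f"{operation:<15} {count:.3f}")
```

The index add comes out at 1.105, which the memo rounds to 1.1; the other three entries match exactly.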



GO TO 

The GOTO is the simplest of the statements. Except for the
totally non-occurring assigned GOTO (0%) and the very infrequent computed
GOTO (1%), the GOTO maps directly into a branch. In fact, since 50% of
all GOTO's occur in logical IF's, they compile into a conditional branch
which has already been counted in the IF analysis. Therefore the
average GOTO statement is executed as:

.49 goto (unconditional),
.01 computed go to.




CALL/RETURN 



The CALL/RETURN pair is straightforward to analyze. It expands
into a state save, a state restore, and two unconditional branches.

In addition, arguments and results are passed using pointers in
the stack. Therefore the overhead of indirect references is associated
with CALL/RETURN. The assumption was made that there are, on the
average, 5 indirect references per CALL. Therefore the average
CALL/RETURN pair is executed as:

1 state save,
1 state restore,
2 unconditional go to,
5 indirections.



WRITE 

Although WRITE occurs roughly 1% of the time, it has been observed that
it actually consumes 25-50% of execution time. This is caused by two
factors:

1) The WRITE statement could contain an "implied DO" or a list
of variables to be written; therefore the average WRITE statement really
involves multiple WRITE's. The assumption was made that the average
WRITE executes 7 times. There is a tremendous deviation here because an
instance of a WRITE could specify a single variable or a 100X100 matrix.

2) The data to be written must be converted from binary to
decimal and edited according to a format specification. These
"primitive" operations are quite complex and time-consuming, causing the
typical WRITE dynamic execution weight to be much higher than the other
statements. This is a fundamental problem with this type of analysis:
the fact that some operation occurs .1% of the time is not enough
information to discount it. If it takes 100 times as long to execute as
another statement occurring 10% of the time it is of equal significance.

Therefore the following mix for write is computed:

7 format edit,
7 radix convert,
7 index add,
7 compare&branch,
1 interdomain call to write.



READ 

Since READ occurs very infrequently it is not handled in detail.
Also, it is very similar to WRITE, and acceleration of formatting and
radix conversion should be bidirectional.




DYNAMIC MIX 



This section contains the final results of this study: the
conclusions of sections 2&3 are combined to produce a SWAG Fortran mix.



STATEMENT SUMMARY

Statement     dynamic weight   primitive op     freq.   weighted freq.

Assignment        .57          add               .66        .38
                               move              .45        .26
                               ndx add           .30        .17
                               mpy               .29        .16
                               div               .09        .05
                               ndx mpy           .05        .03
                               lib. fun.         .04        .02
                               user fun.         .02        .01

IF                .17          comp&branch      1.03        .18
                               add               .20        .03
                               ndx add           .08        .01
                               mul               .08        .01
                               div               .03        .00
                               ndx mul           .02        .00
                               lib. fun.         .01        .00

GOTO              .12          goto(uncond.)     .49        .06
                               case              .01        .00

DO                .09          ndx add          1.1         .10
                               comp&branch      1.0         .09
                               move              .1         .01

CALL/RETURN       .03          state save       1.0         .03
                               state restore    1.0         .03
                               goto(uncond.)    2.0         .06
                               indirection      5.0         .15

WRITE             .01          format edit      7.0         .07
                               radix conv.      7.0         .07
                               ndx add          7.0         .07
                               comp&branch      7.0         .07
                               I/O directive    1.0         .01





                        DYNAMIC MIX
primitive            weighted freq.    normalized freq.

add                      .41               .20
ndx add                  .35               .17
comp&branch              .34               .16
move                     .27               .13
mul                      .17               .08
indirection              .15               .07
goto(uncond.)            .12               .06
format edit              .07               .03
radix conv.              .07               .03
div                      .05               .02
ndx mul                  .03               .01
lib. fun.                .02               .01
user fun.                .01              <.01
I/O directive            .01              <.01
case                    <.01             <<.01
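
The final mix is the per-primitive sum of the weighted frequencies in the statement summary, normalized by their total. The sketch below works in hundredths to avoid floating-point noise. It makes two assumptions that the memo does not state but that reproduce its figures: entries whose weighted frequency rounds to .00 are dropped, and state save and state restore are left out of the total because they do not appear in the memo's final list.

```python
from collections import Counter

# (primitive, weighted frequency in hundredths), from the statement summary.
WEIGHTED = [
    # Assignment
    ("add", 38), ("move", 26), ("ndx add", 17), ("mul", 16), ("div", 5),
    ("ndx mul", 3), ("lib. fun.", 2), ("user fun.", 1),
    # IF (entries that round to .00 omitted)
    ("comp&branch", 18), ("add", 3), ("ndx add", 1), ("mul", 1),
    # GOTO
    ("goto(uncond.)", 6),
    # DO
    ("ndx add", 10), ("comp&branch", 9), ("move", 1),
    # CALL/RETURN (state save and restore omitted, see assumption above)
    ("goto(uncond.)", 6), ("indirection", 15),
    # WRITE
    ("format edit", 7), ("radix conv.", 7), ("ndx add", 7),
    ("comp&branch", 7), ("I/O directive", 1),
]


def dynamic_mix(entries):
    """Return (totals in hundredths, normalized fractions) per primitive."""
    totals = Counter()
    for primitive, hundredths in entries:
        totals[primitive] += hundredths
    grand_total = sum(totals.values())
    normalized = {p: count / grand_total for p, count in totals.items()}
    return totals, normalized


totals, normalized = dynamic_mix(WEIGHTED)
for primitive, count in totals.most_common():
    print(f"{primitive:<14} {count / 100:.2f}   {normalized[primitive]:.2f}")
```

With these assumptions the weighted and normalized columns of the table are reproduced for every primitive listed with a two-digit value.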




APPENDIX III: CRUCIAL COBOL OPERATIONS



STUDY Z: A STATIC AND DYNAMIC STUDY OF COBOL SOURCE ELEMENT FREQUENCIES



This study shows static and dynamic occurrence of Cobol verbs and their
operands for 9,000,000 Cobol verb executions of a 15000 verb program.
According to this study the dynamic distribution of verbs is:

static    dynamic    ratio d:s    verb

26         42.7        1.7        IF
33         25.6         .75       MOVE
20         12.0         .6        GO TO (conditional and unconditional)
 4.9        9.5        2          ADD
  .55       2.2        4          MPY
  .57       2.1        4          SUB
 6.5        1.5         .22       PERFORM
  .26        .4        1.5        DIV






The strong disparity of static and dynamic frequencies and the
interchange of the 1st and 2nd most frequent verbs confirms my
prejudices against static studies.



The dynamic distribution of operands by verb (as % of all verb
executions and as % of all executions of this verb) is
(where bin is subscript, exd is display, pck is packed, lit is literal):

verb   operands           % of all   % of verb

ADD    bin, bin              .66        7.0     bin probably is a local
       exd, exd             2.9        30.6     equivalent of pck and
       exd, pck              .38        4.1     will be so considered.
       pck, pck              .04         .4
       lit, bin             3.1        33.1     if bin is really the
       lit, exd             1.22       12.8     equivalent of index
       lit, pck              .1         1.1     rather than packed
       exd, pck, bin         .67        7.1     or sometimes one or the
       lit, pck, bin         .16        1.7     other we are misled.

if bin is assumed to be pck these percentages change to become

       pck, pck              .7         7.4
       lit, pck             3.2        34.2

Four accelerated S-ops:

add display to display            2.9
increment packed by literal       3.2
increment display by literal      1.2
add packed to packed               .7

would account for 7% of all executed Cobol instructions.

verb   operands           % of all   % of verb

DIV    exd, exd, exd         .05       12.9
       exd, lit, exd         .04       10.9
       pck, exd, pck         .21       54.4
       lit, exd, exd         .085      21.8

IF     x, x                 3.0         7.0
       x, lit              11.3        26.5
       bin, lit             6.0        14.0
       exd, exd              .4         1.1
       exd, lit             5.4        12.7
       x, x, bin            1.7         4.1
       x, bin, lit          1.2         2.8
       x, x, x, x           1.6         3.8
       lit, x, x, x         1.5         3.5
       exd, lit, x, x        .47        1.1
       lit, x, lit, x       2.3         5.5

Four accelerated compare and branch instructions:

compare display to display,             6.3
compare display to literal,            12.8
compare packed to literal,              6.0
compare display numeric to literal      5.4

would account for 30% of all dynamically executed Cobol verbs.



verb   operands           % of all   % of verb

MOVE   x, x                 6.8        27.3
       exd, exd              .8         3.0
       exd, x               1.2         4.5
       lit, x               2.0         8.6
       lit, bin             1.6         6.2
       lit, exd             1.8         7.0
       x, x, bin            1.8         7.3
       x, bin                .85        3.3
       exd, rpt, bin         .72        2.8
       x, x, bin, bin        .31        1.2
       x, bin, x, bin       3.1        12.0    ??

Four accelerated s_ops:

move display to display,          9.8
move lit to display,              2.0
move packed to packed,            3.0
move lit to display numeric,      1.8

would account for 16.6% of all dynamically executed Cobol verbs.
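
The coverage claimed for the accelerated operations can be tallied per verb. This sketch assumes the four figures printed beside each group of accelerated operations are read correctly from the scan and are percentages of all dynamically executed verbs; the dictionary name is illustrative.

```python
# Percent of all executed Cobol verbs covered by each accelerated operation.
ACCELERATED = {
    "ADD": [2.9, 3.2, 1.2, 0.7],
    "IF": [6.3, 12.8, 6.0, 5.4],
    "MOVE": [9.8, 2.0, 3.0, 1.8],
}

coverage = {verb: round(sum(shares), 1) for verb, shares in ACCELERATED.items()}
overall = round(sum(coverage.values()), 1)

for verb, share in coverage.items():
    print(f"{verb:<5} {share}")
print("all twelve accelerated operations:", overall)
```

The IF and MOVE sums agree with the 30% and 16.6% quoted in the text. The ADD figures as printed sum to 8.0, slightly above the 7% quoted, so either the text is approximate or one of the scanned figures is misread.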




APPENDIX IV: COBOL ACCELERATORS
OBJECTIVE 

To determine what if any components should be added to FHP 
hardware to improve the performance of Cobol programs.

BACKGROUND 

There is a possibility of providing operation acceleration 
features on FHP systems that can enhance their execution of Cobol 
programs. To decide what operations to accelerate we would like to know 
the relative frequencies of Cobol verbs and data types. Unfortunately 
there is a dearth of reliable information on this topic and we are 
reduced to applying liberal doses of intuition to what studies are 
available.

There are 3 DYNAMIC studies which address this question: 

360/85 design study 

360 instruction frequency study 

STUDY Z dynamic/static Cobol verb study. 

The first two were done to characterize current 360 instruction 
execution frequencies, the last to study actual Cobol dynamics. 

One static study of 360 code generated by the DOS ANSI compiler 
for a DGC application program provided some surprises. Two other static 
studies are most noteworthy for their discrepancy with dynamic data: 

Guelph University study of university administrative 
programs 

STUDY Z static Cobol verb study. 

In STUDY Z the 6 dynamically most frequently occurring Cobol
verbs, their static frequencies, the ratio of dynamic to static
frequencies and comparison with IBM dynamic and Guelph static
frequencies are:

verb        %dyn              %stat            dynZ/statZ
          Z      IBM        Z      Guelph

IF        43     12.5       26     14-30         1.65
MOVE      26     35         33     30-40          .79
GOTO      12     33         20     14-30          .60
ADD        9.5    4.5        4.9    2-3          1.9
MPY        2.2               .55                 4
SUB        2.1               .57                 3.7

The last column indicates that the dynamic frequency of operations is
typically from twice to 1/2 their static frequency and, therefore,
that the ratio of two dynamic frequencies is from 4 times to 1/4 that
of the ratio of their static frequencies.

In the IBM studies the Cobol verbs have disappeared in the 360
opcodes. At first we felt that we could isolate the "architectural
overhead" instructions from the "substantive" ones. Examination of the
code generated by the DOS ANSI compiler shakes that belief. We had
guessed 30% to 40% overhead. In fact each Cobol verb is compiled into a
STATIC average of three 360 instructions. Unless the dynamically most
frequent instructions compile into substantially fewer instructions than
average we are faced with perhaps 50% to 60% overhead. (Interestingly,
one dynamic study shows that 7 of the most obviously substantive
instructions dynamically account for 40% of all instructions. I was too
shy to guess that this was in fact all the substantive instructions and
that 60% rather than 30% were overhead. Unfortunately, trying to induce
the Cobol verbs which correspond to these substantive operations
produces the very different dynamic frequencies in the above table.)



OBSERVATIONS 




Despite these confusions there are some underlying similarities
among all these most frequent substantive Cobol verbs:

1. They address 2 streams of data being read from memory and
   compare them (IF).

2. They address 2 streams of data, 1 being read and 1 being
   written, perhaps after a "trivial" transformation (MOVE).

3. They address 2 streams of data being read from memory and
   "combine" them to produce a 3rd stream to be written
   to memory (ADD, SUB, etc.).

GOAL 

The goal of Cobol accelerators should be to allow these kinds of 
operations to proceed at memory-cache-JPDbus bandwidth. 

REQUIREMENTS 

To meet this goal we may need special purpose accelerators in
the following areas:

1 fetch can send 1 address to cache each cycle

2 cache can send 1 JPDbus width of data each cycle

3 execute can

compare
pack
unpack
add, subtract

one JPDbus width of data each cycle. IBM checks all such
operations for valid data and optionally aborts on invalid
data. To be comparable to IBM in this matter and meet our
performance goals we may have to add special purpose checks.

4 The Cobol standard defines several bizarre data formats that
we must support. To do so in a reasonable fashion may require
special decode ROMs for ASCII and EBCDIC separate and
overpunch signs.

Note that accelerators for functions 1 and 2 will also accelerate the
operation of Fortran and SPL programs and of kernel functions like LAT.



SUMMARY 

FHP hardware believes that this goal and these requirements to 
meet this goal are worth investment in special purpose hardware and will 
add such hardware as appears to be feasible to FHP systems, either in 
all systems or as special optional Cobol accelerator packages. FHP 
hardware solicits FHP software support, comment or correction of this 
position.




